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Low complexity method for large-scale self-consistent ab initio electronic-structure 

calculations without localization 
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A novel low complexity method to perform self-consistent electronic-structure calculations using 
the Kohn-Sham formalism of density functional theory is presented. Localization constraints are 
neither imposed nor required thereby allowing direct comparison with conventional cubically scaling 
algorithms. The method has, to date, the lowest complexity of any algorithm for an exact calculation. 
A simple one-dimensional model system is used to thoroughly test the numerical stability of the 
algorithm and results for a real physical system are also given. 
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Low complexity electronic-structure methods Il5| . us- 
ing the Kohn-Sham density functional approach [l|, 
where the operation count scales with respect to sys- 
tem size (W-scaling') as A''" where a < 2 have been 
around for a few decades. A comprehensive review of low 
complexity methods is given in reference fs]. Contrary 
to what is often reported the theoretical upper bound 
for the A^-scaling of an exact self-consistent algorithm 
has been set at 0{N'^) ever since Fermi operator ex- 
pansion (FOE) methods were developed 3. This letter 
shows how the theoretical upper bound for the scaling 
of such calculations can be lowered to 0{a{d, N)N~d—'^ 
where d is the dimensionality of repetition of a full three- 
dimensional system and a{l,N) = log2(A^), a{2,N) < 2 
and a{3,N) < 4/3. For large scale calculations low 
complexity algorithms are without doubt the future of 
electronic-structure implementations. However, low com- 
plexity ab initio algorithms are not in common usage at 
the moment, primarily due to two main reasons. Firstly, 
much of the work currently being carried out deals with 
systems that are too small to be amenable to low com- 
plexity approaches if high accuracy is desired. Secondly, 
low complexity algorithms are not yet fully functional 
and fully stable for general systems - so a sufficient level 
of confidence in using these codes has not been estab- 
lished. While the first reason is rapidly being diminished 
due to the ever reducing cost of a floating point opera- 
tion, the second may prove to be far more stubborn. 

Most low complexity algorithms fall broadly into two 
categories; either they attempt to calculate localized or- 
bitals or they seek to evaluate the density matrix (DM) 
directly. For a general system only the latter is known 
to provide a low complexity solution. In the case of a 
metal, for example, delocalized states at the Fermi level 
prevent the occupied subspace being represented in terms 
of orthogonal localized orbitals. 

Problems associated with low complexity approaches 
commonly stem from the imposition of a priori local- 
ization constraints. The effect of this restriction varies 
depending on the algorithm and physical system. In or- 
bital minimization algorithms even the initial guess can 



alter the obtained solution. In some cases localization 
will always cast a degree of doubt over the final answers 
(except in the simplest wide-gap systems), and in oth- 
ers prohibits obtaining the relevant physics/chemistry all 
together. Fermi operator expansion (FOE) algorithms 
(either using a polynomial [J Q or rational @ approx- 
imation) for systems with a DM localized in real-space 
provide arguably the most natural and foolproof way of 
obtaining results in 0{N). In these methods the locality 
does not necessarily have to be imposed a priori, rather 
the system can be allowed to inform us of the locality 
in a systematic way. Methods that impose unsystematic 
localization are invariably open to more doubt. While 
a great deal of progress has been made in understand- 
ing the inherent locality present in many systems, low 
temperature metallic systems and charged insulating sys- 
tems with long-ranged DM correlations are still a signif- 
icant challenge. The method presented in this letter is 
primarily aimed at such systems. However, it has also 
been noted that the onset of sparsity of the DM, even for 
wide-gap systems, is 'discouragingly slow' [3| especially 
if high accuracy is required. The main advantage of the 
method in this work is that it relies purely on the locality 
of the basis functions allowing the use of non-orthogonal 
localized basis sets, such as Gaussians, with rather less 
localized orthogonal and dual complements. Also, the 
full DM need not be explicitly calculated. 

The energy renormalization group (ERG) approach [^, 
0, [l3| is a beautiful and elegant concept that has also 
been suggested to cope with such difficult problems. In 
an ideal implementation it may be possible for its scaling 
to better the method given here for d > 1 and equal 
it for d = 1. However, it remains unclear whether an 
ERG algorithm can also provide the density in an efficient 
manner and to some extent the ERG method employs 
cutoffs. Therefore, the ERG method will not be included 
in the definition of FOE methods in the following. 

To date, standard FOE methods have been considered 
to scale quadratically for systems where the DM decay 
length is of the order of the system size. This can be the 
case for very large systems especially for metals at low 
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temperature or if high accuracy is required. The method 
presented here imposes no locahzation constraints and 
scales as 0{a{d, N)N~d~ ) where a{d, N) is a weak loga- 
rithmic factor for d = 1 and tends to a constant in higher 
dimensions. Not only does this represent a new theoreti- 
cal upper bound for the A'^-scaling of an exact algorithm 
(upto the basis set limit) it is also expected to make a sig- 
nificant and immediate impact on systems of low dimen- 
sionality. Furthermore, for d = 1 it can be implemented 
using exclusively standard direct linear algebra routines 
(eg. LAPACK) for the bulk of the computation. This 
is because a d = 1 Hamiltonian (with zero or periodic 
boundary conditions) can always be arranged so that it 
is a banded matrix, with a bandwidth that is indepen- 
dent of system size, if it is constructed from localized 
basis functions. 

We now turn to what will be referred to as the recursive 
bisection density matrix (RBDM) algorithm. We begin 
with a rational approximation of the density matrix [l6| 
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^uJk{H - Zk) ^ uJk^ZkE 
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where P and fi are the inverse temperature and Fermi 
energy respectively. The inverses of the shifted Hamilto- 
nians in equation ([T]) may be evaluated by solving linear 
equations. A number of methods to construct such ratio- 
nal approximations have previously appeared in the lit- 
erature [ill, m, [13. For a given temperature, the condi- 
tion of the shifted matrices is asymptotically independent 
of system size. Therefore, if no localization of the DM 
can be taken advantage of the solution of each equation 
requires 0{N) operations. Since we must solve 0{N) 
equations the overall scaling is 0{N'^) - as stated previ- 
ously. A key point is that to calculate the band-structure 
energy and density 
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using a localized basis set only requires elements of 
the DM that lie within the sparsity pattern of the Hamil- 
tonian. The inverse of such a shifted matrix is clearly 
symmetric as 

[H - zk)-^ - cAc^, Ay = %/(A, - zfc) (3) 
m ZkY'f = {ckc^f = [(Ac^)^c^] = cAc^ (4) 

where {A^} and c are the eigenvalues and eigenvectors of 
H respectively. For simplicity the Hamiltonian matrix H 
in equations is taken to be constructed from an 

orthogonal basis. Generalization to the non-orthogonal 
case simply requires replacing (H — Zk) with {H — ZkS) 
throughout and noting that the eigenvectors in equations 
([3]) and (HI satisfy Sc = I where S is the overlap matrix 
of the basis functions. 
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FIG. 1: Schematic of bisection of matrix inverses for d = 1. 
The broad diagonal line represents the band of the matrix. 
The narrow vertical lines represent the columns of the matrix 
inverses {H—Zk)~^ that are calculated. The dashed horizontal 
lines are rows that are known from the calculated columns 
due to the matrix being symmetric. These rows then specify 
boundary conditions for smaller sets of independent linear 
equations at each sweep (a-d). 



We may then proceed with a recursive bisection of 
the matrix approach without approximation. The eas- 
iest way to demonstrate this principle is to see how one 
can obtain the density for a d — I system, such as a lin- 
ear molecule or carbon nanotube. For such a system the 
Hamiltonian is a banded matrix. The width of the band, 
although independent of system size, is implementation 
and system specific. Therefore, for the sake of clarity 
a truly one-dimensional system will be considered. The 
simplest Hamiltonian we can imagine is a finite-difference 
stencil representing the Laplacian and the local potential 
represented on a grid of spacing h 



Hu = l/h^ + V{x,) 
H,, = -l/{2h'),\i-j\ = l 
= 0,|z-j| > 1. 



(5) 



As this matrix is tridiagonal, a submatrix (on the diag- 
onal) of H requires two boundary points to determine 
the linear equation (-ffsub ^ Zk)x = b. Fig. [1] shows a 
schematic of the RBDM strategy for d = 1. After the 
first sweep (Fig 1 (a)) the first, central and last columns 
are known. From the rows (known as the matrix is sym- 
metric) we now have boundary conditions of two smaller 
problems which can be solved independently (Fig 1 (b)). 
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We may then bisect these two subproblems in a similar 
fashion (Fig 1 (c)). The process continues until the di- 
mensions of the submatrices are comparable to the band- 
width of the matrix (Fig 1(d)), and then direct evalua- 
tion can be used for the remaining subproblems (smallest 
blocks on the diagonal in Fig 1(d)). 

We now turn to the scaling of the method for 1 > d < 
3. We start with a cubic system and imagine increas- 
ing the size of the system by a factor 7 in d dimensions 
thereby increasing the total size of the system by 7'*. 
Firstly, we consider only the cost of the first bisection 
(Fig 1 (a)) of the system and we consider the DM to 
have effectively infinite range. To bisect the system into 
two subsystems requires calculating tt-coI columns (rep- 
resented by vertical lines in Fig 1) of the DM and each 
column requires 0{N) operations to compute. As the 
system size is increased 7^''~^''7icoi columns are required 
to bisect the system. Therefore, the first sweep scales 
as 0(7V(2''-i)/'^) - and this is the leading term. This 
bisection operation must then be repeated until all of 
the desired elements of the DM have been calculated. 
The number of bisections required goes like log2d(A^). 
The number of operations required to perform sweep m 
(to > 1) is - 7V;^/2('^-i)("-i) where iVi is the number 
of operations to perform the first sweep. Therefore, the 
total number of operations may be written as 



2.5e-06 



iVtot OC iV(2'i-l)/'i 



1 



X! 2(d-l)(m- 
m=0 



(6) 



For d = 1 the summation is clearly proportional to 
log2 N . However, in higher dimensions the summation 
is a convergent series and gives 2 for d — 2 and 4/3 for 
d = 3. This is an upper bound for the number of opera- 
tions. Elaborate bisection schemes may reduce the total 
number of operations but the leading scaling with N will 
not be affected. Hamiltonians with broader bands from 
the use of more extended basis functions or non-local 
pseudopotentials require an increase in ricoi, however, this 
does not affect the A^-scaling. 

Another important aspect of any algorithm is numer- 
ical stability. As many elements of the DM rely on pre- 
vious solutions of linear equations we may expect errors 
to accumulate the more bisections we use. It is difficult 
to gauge the precise effect on the total energy, however 
we may concentrate on a single inverse and assume the 
worst case scenario. If we take one of our shifted matri- 
ces that is closest to being singular (the matrix shifted 
closest to the Fermi energy) {H — Zc) then the error in 
solving for one column of the matrix is proportional to 
einK{H — Zc) where em and k are machine precision and 
condition of the matrix respectively. At worse we may 
expect the error to grow linearly with the bisection num- 
ber, though a random- walk accumulation leading to a 
square root dependence is more realistic. Fig. 2 shows 
this slow drift in the value of Tr{H{H — ^c)^^) where 
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Number of bisections 

FIG. 2: Relative error in Tr{H{H — Zc)~^) (single precision) 
compared to the case where no bisections were used (solid 
line). The condition of {H — Zc)~^ is of the order lO''. For 
such an ill-conditioned matrix even the relative error in single 
precision direct diagonalization was ~ 10~^. The dashed line 
shows a fit of the square root of the number of bisections. 



{H — Zc)^^ is a very ill conditioned matrix (certainly as 
ill-conditioned as any in a realistic electronic structure 
calculation). However, each submatrix will have eigen- 
value range similar to that of the full matrix but a less 
clustered eigenspectrum. This will render sM&-linear sys- 
tems becoming further from singularity during the bi- 
section process. The numerics in a full calculation are 
clearly very complex. One-dimensional model systems 
were extensively tested in single precision, including dou- 
ble precision iterative improvement of the solutions, from 
a range of ill-conditioned matrices. In some cases increas- 
ing the bisection number produced results closer to that 
of double precision diagonalization and no catastrophic 
numerical instabilities were detected. 

As a final example we take a more physically realis- 
tic Hamiltonian. A minimal Gaussian basis was used 
to construct Hamiltonian and overlap matrices for linear 
CnH2n-h2 molcculcs using a norm-conserving non-local 
pseudopotential lj|. To obtain a physically reasonable 
eigenspectrum using the minimal basis for this molecule 
requires basis functions with a spatial extent which cor- 
responds to the bandwidth of the matrix being approxi- 
mately 50. This corresponds to a chain length of around 
8 carbon atoms before the bandwidth of the matrix be- 
comes less than the dimension of the matrix. For test- 
ing purposes a low temperature (~ 0.04eV ) Fermi dis- 
tribution distribution with /i taken to be an eigenvalue 
in the valence band was chosen. This corresponds to a 
highly charged insulating system with a long range DM 
(Fig. 3) and also provides an ill-conditioned problem 
ideal to test numerical stability. The absolute/relative 
error, compared to direct diagonalization, for the 1001 
atom C333H668 was ^ 10~^°/10~^^ and 5 bisections were 
required. This further puts into context the numerical 
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FIG. 3: Decay of the central column of the density matrix 
(squares) and two inverses of shifted Hamiltonians for the 
highly charged C333H668 system. The Hamiltonian closest 
to the Fermi energy (crosses) and the Hamiltonian shifted 
furthest from the real axis (triangles). 



drift mentioned in the previous section. No iterative im- 
provement was used in this example, only full double 
precision arithmetic, and the ill-conditioning of the lin- 
ear systems represents the worst case in a typical calcu- 
lation. Therefore, in a realistic calculation, chain lengths 
containing at least one million basis functions in one- 
dimension (and more in higher dimensions) should be ac- 
cessible (by which point the natural decay of the density 
matrix will surely limit the number of required bisections 
in any case). 

We now discuss some further implementation issues. 
For large systematic basis sets the memory required to 
store the boundary conditions may become prohibitive 
- especially in three dimensions. The method can over- 
come this to some extent by bisecting the system by a 
factor, g, greater than two and building up the density 
matrix in segments. However, when using large basis 
sets, a smaller filtered set of basis functions expanded 
in terms of the underlying basis would be a more real- 
istic approach. It can now be clearly seen how conven- 
tional linear algebra can be used for d — 1 systems. A 
banded matrix can be LU factorized in 0{N) operations 
and a linear equation solved in 0{N) using direct meth- 
ods. Therefore, for d — 1 iterative algorithms need not 
be considered - this is useful when using localized basis 
functions such as Gaussians where iterative methods are 
still difficult to precondition. Also, the matrices shifted 
close to at low temperature, become close to singular 
therefore even basis sets that can be readily precondi- 
tioned in a conventional sense (by damping of high ki- 
netic energy components) will also suffer in this regime, 
so direct methods are desirable. As solving sparse linear 
systems of equations forms the kernel of the method it 
is naturally open to any advances in direct sparse solvers 
for systems where d > \. 



In principle, a similar procedure can be used if one opts 
for a polynomial, rather than a rational, approximation 
to the Fermi function. If F{H) is approximated by a 
polynomial in iJ, F{H) ~ J^k'' ^kH^ : we may construct 
a set of columns of H^[k = 2, ...,np] and store the nec- 
essary boundary matrix elements for each fc in a similar 
fashion to that already described above. 

Even if a system has a DM that is sufficiently localized 
to take advantage of the RBDM method can still be used 
to dramatically reduce the prefactor if the localization 
regions are significantly larger than the spatial extent of 
the basis functions. This will often be the case if highly 
accurate relative energies are desired. Also, the inverses 
of Hamiltonians shifted far from the real-axis have more 
rapid decay allowing true 0{N) evaluation (Fig. 3). 

In conclusion, a simple modification of FOE meth- 
ods has been presented allowing 0{a{d, N)N^^) scal- 
ing where a(l, N) = log2(iV), a(2, N) <2 and a(3, N) < 
4/3 without the need for localization. This is a especially 
useful for systems of low dimensionality with long-ranged 
DM correlations. 
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